
refactor(core): rename benchmark → project for registry + sync (1/4) #1242

Merged: christso merged 1 commit into `main` from `refactor/rename-benchmark-to-project` on May 15, 2026

@christso (Collaborator)

Summary

PR 1 of 4 in the benchmark → project rename. Scope: internal @agentv/core symbols only. Wire formats (HTTP routes, JSON field keys, CLI flag descriptions, Studio routes/components, docs) are unchanged in this PR and will land in:

  • PR 2 — HTTP API routes + JSON field keys (`benchmark_id` → `project_id`, etc.) + CLI flag descriptions
  • PR 3 — Studio frontend (TanStack routes, components, hooks, types)
  • PR 4 — Docs + examples + skills cards

Renames

| Before | After |
| --- | --- |
| `packages/core/src/benchmarks.ts` | `projects.ts` |
| `packages/core/src/benchmark-sync.ts` | `project-sync.ts` |
| `BenchmarkEntry` / `BenchmarkSource` / `BenchmarkRegistry` | `ProjectEntry` / `ProjectSource` / `ProjectRegistry` |
| `loadBenchmarkRegistry`, `saveBenchmarkRegistry` | `loadProjectRegistry`, `saveProjectRegistry` |
| `addBenchmark`, `removeBenchmark`, `getBenchmark`, `touchBenchmark` | `addProject`, `removeProject`, `getProject`, `touchProject` |
| `discoverBenchmarks`, `deriveBenchmarkId`, `getBenchmarksRegistryPath` | `discoverProjects`, `deriveProjectId`, `getProjectsRegistryPath` |
| `syncBenchmark`, `syncBenchmarks` | `syncProject`, `syncProjects` |
| `~/.agentv/benchmarks.yaml` (top-level `benchmarks:`) | `~/.agentv/projects.yaml` (top-level `projects:`) |

One-time legacy file migration

`loadProjectRegistry()` calls `migrateLegacyBenchmarksFile()` before reading the registry. The migration handles four possible states:

| State | Behavior |
| --- | --- |
| Only `benchmarks.yaml` exists | Read → rewrite top-level key → write temp → `renameSync` to `projects.yaml` → `unlinkSync` old. One log line. |
| Only `projects.yaml` exists | No-op. |
| Both exist | `projects.yaml` wins. `stderr` warning. Legacy left in place for operator review. |
| Neither exists | No-op (fresh install). |

The temp+rename pattern keeps `projects.yaml` from ever being half-written; the legacy file is only removed after the rename succeeds.
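
The four states and the temp+rename step can be sketched roughly as follows. This is a minimal illustration, not the code in `packages/core/src/projects.ts`: the names `migrateLegacyRegistry`, `rewriteTopLevelKey`, and `MigrationResult` are hypothetical, the warning text is illustrative, and the top-level key is rewritten with a regex where the real implementation may well parse and re-serialize the YAML.

```typescript
import { existsSync, mkdtempSync, readFileSync, renameSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical result type, used here only to make the outcome observable.
type MigrationResult = "migrated" | "conflict" | "noop";

// Hypothetical helper: rewrites only an unindented (top-level) `benchmarks:` key.
function rewriteTopLevelKey(yamlText: string): string {
  return yamlText.replace(/^benchmarks:/m, "projects:");
}

// Sketch of the migration, parameterized on the config directory so it can be
// exercised against a throwaway directory instead of the real ~/.agentv.
function migrateLegacyRegistry(configDir: string): MigrationResult {
  const legacyPath = join(configDir, "benchmarks.yaml");
  const currentPath = join(configDir, "projects.yaml");
  const hasLegacy = existsSync(legacyPath);
  const hasCurrent = existsSync(currentPath);

  if (hasLegacy && hasCurrent) {
    // Both exist: projects.yaml wins and the legacy file is left untouched.
    console.error(`[agentv] Both ${legacyPath} and ${currentPath} exist. Using projects.yaml.`);
    return "conflict";
  }
  if (!hasLegacy) {
    // Only projects.yaml exists, or neither does (fresh install).
    return "noop";
  }

  // Only the legacy file exists: write a complete temp file first, then rename
  // it into place so projects.yaml is never observed half-written.
  const tempPath = `${currentPath}.tmp-${process.pid}`;
  writeFileSync(tempPath, rewriteTopLevelKey(readFileSync(legacyPath, "utf8")));
  renameSync(tempPath, currentPath);
  // The legacy file is removed only after the rename has succeeded.
  unlinkSync(legacyPath);
  return "migrated";
}

// Usage against a temporary directory.
const dir = mkdtempSync(join(tmpdir(), "agentv-migrate-"));
writeFileSync(join(dir, "benchmarks.yaml"), "benchmarks:\n  - id: alpha\n  - id: beta\n");
const first = migrateLegacyRegistry(dir);
const second = migrateLegacyRegistry(dir);
console.log(first, second); // prints "migrated noop"
```

Because the temp file is created next to `projects.yaml`, the rename stays on one filesystem, which is what lets it replace the target in a single step on POSIX systems.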

Why "project"

5 of 6 LLM observability tools (Phoenix, Langfuse, Braintrust, W&B Weave, LangSmith) use `project` for the container that holds eval runs, traces, and datasets. agentv is adding trace/span/latency capture alongside eval runs, making "benchmark" too narrow. The rename also disambiguates from the academic "benchmark = eval suite" usage that's retained in example directory names (`benchmark-tooling`, `multi-model-benchmark`, etc.) — those genuinely are benchmark suites and stay named that way.

Test plan

  • `bun run typecheck` — passes
  • `bun run lint` — clean
  • `bun run test` — 2374 tests pass (1768 core including 4 new migration tests + 67 eval + 539 cli, 0 fail)
  • `bun run build` — all packages build
  • Pre-push hooks pass (including `validate:examples` over 56 example evals)
  • Red/green UAT with a real simulated home dir at `/tmp/uat-rename-home/`:

```
$ ls /tmp/uat-rename-home/.agentv/
benchmarks.yaml # legacy fixture with 2 entries (alpha + beta with source)

$ HOME=/tmp/uat-rename-home bun -e "import { loadProjectRegistry } from '@agentv/core';
const r = loadProjectRegistry(); console.log(r.projects.map(p => p.id));"
[agentv] Migrated registry: benchmarks.yaml → projects.yaml (2 entries)
[ "alpha", "beta" ]

$ ls /tmp/uat-rename-home/.agentv/
projects.yaml # legacy file gone, content preserved including .source

# Second load → silent no-op

$ HOME=/tmp/uat-rename-home bun -e "...loadProjectRegistry()..."
[ "alpha", "beta" ] # no migration log line — idempotent

# Both-files conflict → projects.yaml wins, warning emitted

$ # (re-create benchmarks.yaml alongside the new projects.yaml)
$ HOME=/tmp/uat-rename-home bun -e "...loadProjectRegistry()..."
[agentv] Both .../.agentv/benchmarks.yaml and .../.agentv/projects.yaml exist.
Using projects.yaml; delete benchmarks.yaml when you've confirmed the new file is correct.
[ "alpha", "beta" ]
```

The 4 new migration tests in `packages/core/test/projects.test.ts` cover the same three transitions plus the fresh-install no-op.

Notes on what's intentionally NOT renamed in this PR

  • HTTP routes like `/api/benchmarks/...` — PR 2.
  • Wire field names `benchmark_id`, `benchmark_name` in API responses — PR 2.
  • Studio route `$benchmarkId` URL param and TanStack route files — PR 3.
  • The private `withBenchmark()` middleware in `serve.ts` — PR 2 (paired with route rename).
  • The "Multi-benchmark mode" console message and CLI flag help text — PR 2.
  • Example directories named `*-benchmark`: they are genuinely benchmark suites in the academic sense and stay as-is by design.
  • `benchmark.json` per-run metrics artifact (Agent Skills compatibility) — a different concept; separate cleanup, deferrable.

🤖 Generated with Claude Code

Internal-only rename (PR 1 of 4). The user-facing "benchmark" terminology
in HTTP routes (/api/benchmarks/...), JSON field names (benchmark_id,
benchmark_name), CLI flags, Studio components, and docs is unchanged in
this PR — those land in PR 2 (HTTP API), PR 3 (Studio frontend), and
PR 4 (docs).

Renamed:
- packages/core/src/benchmarks.ts → projects.ts
- packages/core/src/benchmark-sync.ts → project-sync.ts
- BenchmarkEntry → ProjectEntry, BenchmarkSource → ProjectSource,
  BenchmarkRegistry → ProjectRegistry
- loadBenchmarkRegistry → loadProjectRegistry,
  saveBenchmarkRegistry → saveProjectRegistry,
  addBenchmark → addProject, removeBenchmark → removeProject,
  getBenchmark → getProject, touchBenchmark → touchProject,
  discoverBenchmarks → discoverProjects,
  deriveBenchmarkId → deriveProjectId,
  getBenchmarksRegistryPath → getProjectsRegistryPath,
  syncBenchmark → syncProject, syncBenchmarks → syncProjects
- ~/.agentv/benchmarks.yaml → projects.yaml, top-level key
  `benchmarks:` → `projects:`

One-time migration:
- loadProjectRegistry() calls migrateLegacyBenchmarksFile() before
  reading the registry. If only benchmarks.yaml exists, it is
  read, transformed (top-level key rewritten), written to a temp
  file, atomically renamed to projects.yaml, and the legacy file is
  unlinked. If both files exist, projects.yaml wins and a warning
  is logged. Idempotent: subsequent loads are a no-op.

Rationale: 5 of 6 LLM observability tools (Phoenix, Langfuse,
Braintrust, W&B Weave, LangSmith) use "project" for the container
that holds eval runs, traces, datasets, and other telemetry. agentv
is adding trace/span/latency capture alongside eval runs, making
"benchmark" too narrow. The rename also disambiguates from the
academic "benchmark = eval suite" usage that survives in example
directory names (benchmark-tooling, multi-model-benchmark, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: 5d84e67
Status: ✅  Deploy successful!
Preview URL: https://68177b2b.agentv.pages.dev
Branch Preview URL: https://refactor-rename-benchmark-to.agentv.pages.dev


@christso christso marked this pull request as ready for review May 15, 2026 00:29
@christso christso merged commit 66ffa92 into main May 15, 2026
4 checks passed
@christso christso deleted the refactor/rename-benchmark-to-project branch May 15, 2026 00:41